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Statement  of  Objectives 

Currently,  a  Russian  commercially  available  software  named  IOSO  is  the  most  efficient  and 
the  most  robust  multi-objective  optimization  software.  Similarly,  the  second  best  multi-objective 
optimization  software  package,  modeFrontier,  was  also  developed  in  a  foreign  country  (Italy) 
and  is  licensed  by  their  software  company.  The  only  multi-objective  optimization  software 
package  that  was  developed  in  the  United  States  and  is  licensed  by  the  U.S.  software  company 
is  Engineous.  However,  performance  speed,  robustness  and  overall  sophistication  of  Engineous 
optimizer  are  significantly  inferior  in  comparison  to  modeFrontier  and  especially  to  IOSO 
performance  speed,  versatility  and  robustness.  Performance  of  IOSO,  modeFrontier  and 
Engineous  multi-objective  optimizers  published  in  the  open  literature  clearly  indicates  that 
IOSO,  which  involves  concepts  of  neural  networks,  radial  basis  functions,  and  self-adapting 
response  surface  methodologies,  requires  the  minimum  number  of  the  objective  function 
evaluations  and  that  it  is  the  most  versatile  and  robust  multi-objective  optimizer. 

Based  on  our  experience  in  developing  hybrid  single-objective  constrained  optimization 
algorithms,  the  objective  of  this  effort  was  to  develop  a  hybrid  multi-objective  constrained 
optimization  algorithm  that  will  be  robust  and  very  fast,  thus,  emulating  some  of  the  main 
advantages  of  IOSO  algorithm. 

The  proposed  hybrid  robust  multi-objective  optimizer  was  developed  to: 

1.  utilize  several  evolutionary  optimization  algorithms,  a  set  of  rules  for  automatic  switching 
among  these  algorithms  in  order  to  accelerate  the  overall  convergence  and  avoid 
termination  in  a  local  minimum, 

2.  involve  development  of  algorithms  for  multi-dimensional  response  surfaces 
(metamodels)  that  are  fast,  accurate  and  robust  by  utilizing  wavelet-based  artificial 
neural  networks,  polynomials  of  radial  basis  functions  and  self-adapting  maps, 

3.  involve  an  algorithm  based  on  Bayesian  statistics  (using  Kalman  filters,  Markov  chains, 
etc.)  that  will  enhance  robustness  of  the  multi-objective  optimization  algorithm  by 
accounting  for  uncertainties  in  the  input  data  and  in  the  accuracy  of  the  evaluation 
methods  for  the  multiple  objective  functions. 

The  proposed  hybrid  evolutionary  multi-objective  optimization  algorithm  was  also  to  be 
thoroughly  tested  on  a  number  of  standard  test  problems  with  two  and  three  simultaneous 
objectives  where  the  Pareto  surface  could  be  continuous  and  discontinuous. 

The  proposed  robust  hybrid  optimizer  was  to  be  programmed  in  such  a  way  that  it  can  be 
transportable  to  any  single-processor  or  a  parallel  processor  computing  platform. 
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Summary  of  Significant  Results  and  Recommendations  for  Future  Research 

One  of  the  main  conclusions  of  this  research  is  that  any  new  multi-objective  optimization 
algorithms  should  involve  an  efficient  and  reliable  interaction  of  the  following: 

•  A  combination  of  several  multi-objective  evolutionary  optimization  algorithms, 

•  A  fast  and  accurate  algorithm  for  fitting  large  dimensionality  response  surfaces  using 
very  small  data  sets, 

•  An  algorithm  that  can  efficiently  and  reliably  account  for  statistical  noise  (uncertainty) 

In  other  words,  even  the  best  possible  multi-objective  evolutionary  optimization  algorithm  for 
one  class  of  problems  cannot  be  expected  to  perform  better  than  any  other  optimization 
algorithm  for  all  classes  of  optimization  problems.  Therefore,  a  combination  of  several 
optimization  algorithms  should  be  used  in  order  to  assure  reliability  of  the  evolutionary  multi¬ 
objective  optimization.  Results  presented  in  this  report  confirm  that  MOHO  is  one  such 
optimization  concept  that  works. 

Multi-dimensional  response  surfaces  (metamodels)  are  found  to  be  the  key  to  efficiency  and 
feasibility  of  any  realistic  multi-objective  optimization  where  only  small  data  sets  are  available, 
because  of  the  very  high  cost  of  experimentally  or  computationally  evaluating  the  objective 
functions.  Among  the  several  methods  developed  in  this  project  for  automatic  generation  of 
response  surfaces,  only  those  methods  that  offer  consistently  high  accuracy  at  a  relatively  low 
computing  cost  are  attractive.  This  is  especially  becoming  critical  with  the  increase  in  the 
number  of  the  design  variables  (dimensions  of  a  response  surface)  as  industry  is  demanding 
computationally  efficient  and  accurate  response  surface  algorithms  for  optimization  of  realistic 
problems  where  numbers  of  design  variables  run  into  hundreds  and  soon  into  thousands. 

Comparative  analysis  of  the  three  methods  that  were  developed  in  this  project  (wavelet 
based  neural  networks,  polynomials  of  radial  basis  functions,  and  multi-layer  self-assembling 
concept)  clearly  points  at  the  following  conclusions: 

WNN  (wavelet  based  neural  networks)  should  be  used  only  for  relatively  low  dimensional 
response  surface  fits  where  the  number  of  design  variables  is  less  than  approximately  35.  For 
higher  dimensional  response  surfaces,  the  accuracy  of  this  algorithm  rapidly  decreases  and  the 
computing  costs  rapidly  increases.  Also,  this  method  does  not  give  accurate  results  for  small 
and  scarce  data  sets. 

Polynomials  of  radial  basis  functions  are  very  accurate  and  computationally  fast  for  fitting 
response  surfaces  having  up  to  approximately  1000  design  variables.  For  problems  having  a 
larger  number  of  design  variables  (dimensions  of  the  response  surface),  the  matrices  that  need 
to  be  solved  become  exceedingly  ill-conditioned  which  leads  to  deterioration  of  accuracy  of  the 
fitting.  At  the  same  time,  the  computing  costs  for  solving  these  large  matrices  start  increasing 
rapidly.  This  method  gives  good  results  even  for  small  data  sets  and  reasonable  results  for 
scarce  data  sets. 

Response  surfaces  generation  using  a  hybridized  multi-layer  self-organizing  was  developed 
with  a  capability  to  use  either  linear,  quadratic,  cubic,  or  quartic  local  polynomials  at  each  node 
of  each  level  of  the  multi-layer  tree.  It  was  found  that  the  most  accurate  approach  is  to  use  a 
lower  order  local  fitting  (typically  using  quadratic  polynomials)  followed  by  a  polynomial  radial 
basis  function  fitting  the  resulting  error.  This  method  for  generating  multi-dimensional  surfaces  is 
quite  computationally  expensive  for  lower  dimensionality  response  surfaces.  However,  this 
method  is  the  most  promising  response  surface  fitting  method  for  very  large  dimensionality 
surfaces  as  its  computational  cost  increases  approximately  linearly  with  the  number  of  design 
variables.  This  method  is  capable  of  giving  good  results  for  small  and  even  scarce  data  sets. 

Uncertainty  evaluation  of  initial  data  and  consequently  of  the  output  results  was  developed 
using  Bayesian  statistics  based  on  Kalman  filters  for  linear  problems  and  on  Markov  chains 
Monte  Carlo  (MCMC)  filters  for  nonlinear  problems.  The  MCMC  method  was  proven  to  be 
significantly  more  accurate  and  robust,  but  it  is  computationally  one  order  of  magnitude  costlier. 
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Multi-Objective  Hybrid  Optimization  (MOHO)  With  Automatic  Switching  Among 
Individual  Search  Algorithms 

The  MOHO  software  [1,2,3]  that  was  developed  as  a  part  of  this  effort  is  a  high  level  relay 
hybrid  metaheuristic  algorithm  [4],  In  its  current  version,  three  different  evolutionary  multi¬ 
objective  search  algorithms  are  coordinated  and  applied  in  order  to  expedite  the  search  for  a 
Pareto  front.  Like  many  other  evolutionary  algorithms,  MOHO  runs  in  steps  of  population 
generations.  At  each  generation  the  algorithm  that  is  selected  makes  a  new  generation  using 
any  or  all  of  the  information  provided  to  it:  the  last  generation’s  population  and  the  latest  non- 
dominated  set.  Then,  the  MOHO  algorithm  combines  the  new  generation  and  the  latest  non- 
dominated  set  to  create  a  new  non-dominated  set.  MOHO  keeps  track  of  this  process  to  detect 
five  possible  improvements  to  the  dominated  set  (the  Pareto  approximation).  If  the  particular 
search  algorithm  can  achieve  at  least  two  of  any  of  the  five  improvements,  this  algorithm  is 
allowed  to  create  the  next  generation.  MOHO  has  been  successfully  run  on  Microsoft  Windows 
workstations,  Linux  workstations  and  Beowulf  style  clusters.  The  software  can  run  in  two 
modes;  serial  and  parallel.  In  serial  mode  the  optimization  and  objective  function  evaluation 
occur  on  the  same  processor.  In  parallel  mode,  the  optimizer  runs  on  a  processor  (the  cluster 
server  in  this  case)  and  the  objective  function  evaluations  are  then  made  into  jobs  that  are 
submitted  to  the  job  manager  of  the  cluster  in  question.  This  architecture  is  preferred  by  the 
authors  to  allow  for  large  parallelized  computational  fluid  dynamics  and  finite  element  analysis 
based  objective  function  calls  that  consume  significant  amounts  of  computing  time. 

MOHO  uses  three  multi-objective  optimization  algorithms: 

1 .  Strength  Pareto  Evolutionary  Algorithm  (SPEA-2)  by  Zitzler  et  a/.  [5], 

2.  Multi-objective  implementation  [6]  of  the  single  objective  Particle  Swarm  (referred  to  as 

MOPSO  here)  by  Eberhardt  et  al.  [7], 

3.  Non-Sorting  Differential  Evolution  (NSDE)  algorithm  which  is  a  low  level  hybrid  metaheuristic 

search  combining  NSGA-II  by  Deb  et  al.  [8]  and  Differential  Evolution  by  Storn  and  Price  [9], 

Each  of  these  constituent  search  algorithms  was  modified,  if  needed,  to  fit  into  MOHO's 
system  of  maintaining  the  non-dominated  set  and  clustering  outside  of  the  search  algorithms. 
Also  each  search  algorithm  was  set  up  to  be  able  to  accept  populations  and  non-dominated  sets 
in  a  generalized  format  to  allow  the  hybridization  to  run  smoothly.  These  conditions  do  not 
adversely  affect  the  application  of  the  individual  search  algorithms. 

SPEA-2  was  utilized  in  the  following  manner  for  this  work: 

1.  A  population  P  and  a  non-dominated  set  P’  are  handed  to  the  algorithm  from  the 
centralized  part  of  the  MOHO  software. 

2.  Then,  the  fitness  is  calculated  for  members  of  P  and  P'. 

3.  Binary  tournament  selection  is  used  to  select  the  new  set  of  offspring  from  the  mating 
pool  of  P+P’. 

4.  Two-point  crossover  and  bit  mutation  are  performed,  but  can  be  changed  by  the  user 
from  the  input  to  better  suit  a  given  problem. 

5.  New  population  and  old  non-dominated  population  are  returned  to  the  centralized  portion 
of  MOHO  where  the  new  P’  is  generated  and  clustering  is  performed,  if  needed. 

The  SPEA-2  algorithm  does  not  perform  the  merging  of  populations  and  non-dominating 
sets.  This  step  is  performed  by  MOHO,  external  to  the  search,  so  the  switching  algorithm  can 
monitor  the  progress  of  the  non-dominated  set. 

The  multi-objective  Particle  Swarm  algorithm  used  for  this  work  is  derived  from  the  original 
Particle  Swarm  Optimization  (PSO)  algorithm  developed  by  Eberhardt  et  al.  [7],  Some  Multi- 
Objective  Particle  Swarm  Optimization  (MOPSO)  algorithms  utilize  weighted  sums  of  PSO 
algorithms  [6],  While  this  type  of  application  of  PSO  is  effective  in  its  own  right,  an  optimization 
algorithm  solving  multi-objective  problems  in  a  Pareto  optimal  sense  was  needed.  To  modify 
PSO  for  multi-objective  optimization,  the  definition  of  personal  best  value  and  global  best  values 
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have  been  modified.  In  MOPSO,  the  personal  best  value  for  a  particle  is  the  non-dominated 
objective  in  the  particle’s  search  history.  The  global  best  value  is  the  member  of  the  current 
generation’s  non-dominated  set  that  is  closest  (in  objective  space)  to  the  particle  for  which  the 
velocity  is  being  calculated.  Other  than  the  method  for  choosing  the  global  and  personal  best 
for  each  particle,  the  rest  of  MOPSO  remains  true  to  the  original  PSO.  The  MOPSO  algorithm 
was  utilized  in  the  following  way  for  this  work: 

1.  A  population  P  and  a  non-dominated  set  P’  are  handed  to  the  algorithm  from  the 
centralized  part  of  the  MOHO  software. 

2.  A  velocity  vector  for  each  particle  is  calculated  using  the  technique  described  earlier 

3.  The  displacement  for  each  particle  is  calculated  using  the  equations  of  motion  and 
unit  time  step. 

4.  New  population  and  old  non-dominated  population  are  returned  to  the  centralized 
portion  of  MOHO  where  the  new  P’  is  generated  and  clustering  is  performed,  if 
needed. 

The  NSDE  created  for  this  work  is  a  low  level  combination  of  NSGA-II  [8]  and  differential 
evolution  (DE)  [9].  In  particular,  the  mutation  operator  from  DE  replaces  the  mutation  operator 
in  the  original  NSGA-II.  The  rest  of  the  algorithm  is  from  NSGA-II.  At  the  beginning  of  the 
algorithm,  the  last  population  and  the  non-dominated  population  are  used  to  play  the  roles  of  the 
offspring  and  parents  from  the  last  generation,  respectively.  At  this  point,  NSDE  continues  on 
like  NSGA-II  until  the  new  generation  is  created.  Then,  the  new  population  and  the  old  non- 
dominated  set  are  handed  back  to  the  switching  algorithm  in  MOHO. 

After  each  generation  is  created,  and  the  new  non-dominated  set  is  identified,  the  software 
assigns  the  new  non-dominated  set  a  score  from  zero  to  five.  If  the  new  non-dominated  set  gets 
a  score  of  two  or  better,  the  algorithm  that  generated  the  set  was  allowed  to  create  the  next 
generation  of  population  points.  An  algorithm  scores  a  point  for  each  of  five  possible 
improvements  that  are  achieved  from  one  generation  to  the  next.  Also,  if  an  algorithm  has  run 
consecutively  beyond  a  user  specified  iteration  (generation)  limit,  the  run  switches  to  the  next 
constituent  search  algorithm  determined  by  the  algorithm  pointer  array.  This  rule  was  used  to 
allow  all  constituent  search  algorithms  an  opportunity  to  improve  the  Pareto  approximation. 

The  switching  algorithm  compares  the  non-dominated  set  from  the  current  generation  to  the 
non-dominated  set  of  the  previous  generation.  The  comparison  process  consists  of  looking  at 
five  desired  improvements  to  the  Pareto  approximation.  The  improvements  are  actually  gains  in 
five  performance  criteria  (quality  factors).  More  than  one  quality  factor  was  used  because  of  the 
work  of  Zitzler  et  al.  [10,  11],  where  it  is  shown  that  as  opposed  to  the  situation  in  a  single 
objective  optimization,  one  quality  factor  alone  should  not  be  used  to  compare  two  Pareto 
approximations.  This  is  why  an  algorithm  must  score  a  minimum  of  two  in  order  to  continue 
creating  new  generations  of  population  points.  This  work  also  puts  forth  definitions  of 
compatibility  and  completeness  for  quality  factors.  After  the  improvement  metrics  for  the  MOHO 
algorithm  are  presented,  the  applicability  of  the  compatibility  and  completeness  presented  in 
Zitzler  et  al.  [10,  11]  will  be  discussed. 

The  main  part  of  MOHO  handles  combining  the  new  generation  with  the  most  recent  non- 
dominated  set  to  form  the  new  non-dominated  set  by  employing  objective  space  based 
clustering  as  needed  and  calculating  the  improvements.  The  clustering  is  employed  only  as  a 
non-dominated  set  trimming  tool  when  the  software  determines  that  the  non-dominated  set  will 
grow  beyond  the  user  defined  limit.  Five  improvements  were  developed  and  used  in  this  work 
because  there  is  a  concern  that  adding  too  many  Pareto  based  calculations  would  add 
considerable  overhead  to  the  software.  Now  that  the  algorithm  switching  logic  has  been 
discussed,  the  five  improvements  will  be  defined. 

Improvement  #1  -  Non-Dominated  Set  Size  Changes 
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For  this  improvement,  the  size  of  the  non-dominated  set  changes.  This  change  can  be  either 
the  non-dominated  set  getting  larger  or  smaller.  The  non-dominated  set  grows  in  size  when  a 
new  point  on  the  Pareto  approximation  front  is  discovered  and  the  old  non-dominated  set  is 
below  the  user  defined  non-dominated  set  size  limit.  Shrinking  of  the  non-dominated  set  occurs 
when  a  new  point  or  points,  dominate  larger  set  of  points  from  the  old  non-dominated  set.  This 
indicates  one  of  two  things:  a)  the  new  point(s)  significantly  redefine  the  geometry  of  the  non- 
dominated  set,  or  b)  clustering  has  yet  to  be  employed  on  the  Pareto  approximation  and 
multiple  points  were  clustered  around  each  other. 

Improvement  #2  -  A  Point  from  the  New  Generation  Dominates 

This  improvement  is  satisfied  if  any  population  member  of  the  new  generation  dominates 
any  member  in  the  last  generation’s  non-dominated  set.  Any  opportunity  to  improve  the  Pareto 
approximation  by  removing  a  dominated  point  is  considered  an  improvement.  This  can  be 
expressed  by: 

/')  Set  A  =  False 

/'/')  Let  m  =  Sizeof(Pop)  and  n=  Sizeof(NonDom) 

Hi)  (POP*  f  NonDorr\  ?  True:  False)v,4;/  =  1  1  JL,n 

iv)  2nd  Improvement^  A 

Improvement  #3  -  Change  in  the  Dominated  Hyper  Volume 

The  hyper  volume  quality  factor  has  its  roots  in  the  hyper  volume  calculation  presented  by 
Deb  [12].  When  the  first  population  generation  is  created  using  Sobol’s  quasi  random  sequence 
generator  [13,  14],  the  worst  objective  value  for  each  objective  from  the  entire  population  is 
collected  into  a  worst  case  objective  vector.  This  worst  case  objective  vector  is  used  as  the 
common  diagonal  for  all  hyper  volume  calculations  in  the  search;  a  static  datum  for  the  entire 
run.  The  improvement  is  considered  fulfilled  when  the  new  generation’s  contribution  to  the  non- 
dominated  set  causes  a  change  in  the  dominated  hyper  volume  for  the  non-dominated  set. 

Improvement  #4  -  Average  Distance  Change 

In  this  calculation  the  average  Euclidian  distance  of  all  the  objective  vectors  of  the  new 
generation’s  non-dominated  set  is  calculated.  This  is  basically  the  magnitude  of  the  objective 
vectors,  because  the  datum  is  the  origin  of  the  objective  space.  If  the  measure  changes  from 
the  value  of  the  old  generation,  this  improvement  has  been  met.  This  improvement  tries  to 
capture  changes  in  the  geometry  of  the  non-dominated  set. 

Improvement  #5  -  Spread  of  the  Don-dominated  Set 

The  equation  for  the  spread  is  found  in  a  book  by  Deb  [12]  and  its  origins  are  attributed  to 
Zitzler  This  improvement  is  fulfilled  if  the  new  generation’s  non-dominated  set  increases  the 
spread  over  that  of  the  last  generation.  The  equation  for  the  spread  is  as  follows: 


(2) 


Improvements  #3  and  #5  are  common  metrics  for  trying  to  determine  if  one  Pareto 
approximation  set  is  better  than  another.  The  other  improvements  are  not  common,  since  it 
would  be  useless  to  compare  final  Pareto  results  from  two  different  multi-objective  algorithms. 
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If  the  search  algorithm  in  question  is  not  able  to  achieve  at  least  two  improvements,  or  it  has 
consecutively  run  for  the  user  specified  limiting  number  of  iterations,  the  latest  population  and 
non-dominated  set  are  passed  to  the  next  search  algorithm.  MOHO  runs  until  the  maximum 
number  of  function  evaluations  is  performed.  All  optimization  run  parameters  are  specified  by 
the  user  in  an  external  (to  the  software)  input  file.  MOHO  was  tested  in  order  to  evaluate  its 
capability  by  running  it  on  test  problems  from  three  previously  published  works.  The  first  work 
examined  is  the  well  known  multi-objective  optimization  comparison  paper  by  Zitzler,  Thiele  and 
Deb  [10]  (referred  to  as  ZDT  from  this  point  forward).  A  copy  of  the  original  data  from  the  ZDT 
paper  can  be  found  oh  the  website  cited  in  that  paper.  In  the  present  work,  we  will  present  this 
data  again  and  inject  the  results  from  our  MOHO  into  this  original  comparison.  The  second  and 
third  works  that  MOHO  will  be  compared  with,  are  related  to  each  other.  In  the  original  NSGA-II 
paper,  Deb  et  al.  [8,  10]  revisit  some  of  the  ZDT  problems,  present  results  for  some  other 
unconstrained  problems,  and  use  NSGA-II  to  solve  some  constrained  problems.  In  the  present 
work,  our  MOHO  will  was  used  to  solve  the  unconstrained  problems  of  the  original  NSGA-II 
paper.  Results  from  that  paper  will  be  compared  to  MOHO  results.  Finally,  the  paper  by  Vrugt 
and  Robinson  compares  their  AMALGAM  [15]  algorithm  to  the  performance  of  NSGA-II  for 
some  unconstrained  multi-objective  optimization  test  problems.  These  results  will  also  be 
included  in  this  work  and  compared  to  results  from  MOHO  using  the  same  experimental 
conditions. 

In  the  ZDT  work,  the  Strength  Pareto  Evolutionary  Algorithm  (SPEA),  Non-Sorting  Genetic 
Algorithm  (NSGA),  Vector  Evaluated  Genetic  Algorithm  (VEGA),  Niched  Pareto  Genetic 
Algorithm  (NPGA),  Hajela  and  Lin’s  evolutionary  algorithm  (HLGA),  Fonseca  and  Fleming’s 
evolutionary  algorithm  (FFGA),  and  a  Single  Objective  Evolutionary  Algorithm  (SOEA)  were 
studied.  The  algorithms  were  applied  to  six  multi-objective  problems  designed  by  those  authors 
specifically  for  that  work.  All  test  problems  represent  two-objective  optimizations  where  both 
objectives  are  to  be  minimized.  All  seven  algorithms  were  compared  in  two  ways. 

The  optimizer  utilizes  several  multi-objective  evolutionary  optimization  algorithms  and 
orchestrates  the  application  of  these  algorithms  to  multi-objective  optimization  problems,  using 
an  automatic  internal  switching  algorithm.  The  switching  algorithm  is  designed  to  favor  those 
search  algorithms  that  quickly  improve  the  Pareto  approximation  and  grades  improvements 
using  five  criteria.  A  thorough  testing  of  reliability  and  accuracy  of  MOHO  against  a  number  of 
prominent  multi-objective  optimization  algorithms  and  one  hybrid  optimizer  confirmed  that 
MOHO  performs  reliably  and  accurately. 

For  the  first  results  comparing  MOHO  to  the  evolutionary  multi-objective  optimization 
algorithms  from  the  ZDT  work  [10,  11,  12],  it  has  been  shown  that  MOHO  can  outperform  the 
classic  evolutionary  algorithms  for  all  test  problems  except  ZDT5.  Difficulties  surrounding  ZDT5 
test  case  are  not  trivial  as  was  confirmed  by  Deb  et  al.  [12]  who  tested  all  the  ZDT  test 
problems,  except  ZDT5. 

First,  a  plot  of  the  non-dominated  sets  for  each  algorithm  was  made.  To  allow  for 
performance  fluctuations  caused  by  the  random  number  generators  driving  the  initial  population 
and  genetic  operators,  all  algorithms  were  run  30  times  on  each  of  the  six  problems.  The  non- 
dominated  plot  is  generated  by  making  a  union  of  the  non-dominated  sets  for  the  first  five  runs, 
of  each  algorithm,  on  each  problem.  The  non-dominated  set  of  the  unions  is  then  plotted.  In  the 
figures  presented  here,  the  results  for  MOHO  are  injected  into  the  plot  results  from  ZDT  and  the 
random  number  data  from  the  original  plots  is  removed  to  provide  a  clearer  view  of  the 
performances  of  the  search  algorithms.  Figures  1  through  6  are  the  plots  for  ZDT  test  problems 
1  through  6,  respectively,  using  the  original  ZDT  paper  data  and  MOHO  data  run  for  this  work. 

From  these  figures  it  is  evident  that  our  MOHO  algorithm  performs  very  well  on  almost  all  of 
the  ZDT  test  cases  as  far  as  accuracy  is  concerned.  Besides  being  an  accurate  evolutionary 
search  algorithm,  these  figures  demonstrate  that  MOHO  is  also  a  reliable  algorithm  that 
consistently  gives  good  results. 
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The  second  comparison  mechanism  is  the  “cover”  function  proposed  in  the  ZDT  work.  The 
cover  function  evaluates  what  fraction  of  the  non-dominated  set  of  algorithm  A  is  either  equal  to 
or  weakly  dominates  the  non-dominated  set  of  algorithm  B.  The  formula  for  the  cover  functions, 
as  shown  in  the  ZDT  work,  is: 

|{a"e  B;3a'e  A  :  a'p  a"]| 

C(A3):=n - jif - —  (3) 

Of  note  is  the  fact  that  C(A,B)  is  not  necessarily  equal  to  C(B,A). 

Figure  7  shows  the  results  for  the  cover  function  analysis  for  all  test  problems  and 
optimization  algorithms  compared  [3].  In  this  analysis,  all  30  runs  of  each  algorithm  are  utilized. 
Each  cell  in  Figure  7  is  made  by  treating  the  algorithms  in  the  row  as  algorithm  A  in  equation  3. 
Each  of  the  30  Pareto  approximations  for  algorithm  A  is  compared  to  the  algorithm  B  (column). 
So,  for  each  Pareto  approximation  of  algorithm  A,  comparisons  with  30  algorithm  B  Pareto  sets 
are  performed..  This  means  that  30  cover  function  values  are  generated.  Each  column  in  the 
box  plot  (Figure  7)  represents  the  results  for  one  of  the  ZDT  problems.  So,  each  column  in  the 
box  plot  represents  a  total  of  900  Pareto  approximation  set  comparisons. 
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fig.  7  Cover  function  analysis  front  ZOT  with  MOHO  data  inverted 
into  the  original  analysis.  As  in  the  original  analysis  the  value  at  the 
I  nit  timi  of  tin-graph  isO  and  the  valneis  I  at  the  top.  The  hlack  lx>\es  in 
the  hox  plots  show  the  average  cover  value  for  all  .VI  runs,  liie  shaded 
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Using  the  data  generated  from  solving  the  ZDT  test  problems  it  was  possible  to  perform 
another  analysis  of  the  MOHO  algorithm's  performance.  Figure  8  shows  the  average  percent  of 
total  number  of  function  evaluations  used  by  each  of  the  three  constituent  search  algorithms  in 
MOHO  when  applied  to  a  given  ZDT  test  problem.  Figure  8  takes  into  account  all  30  runs  used 
for  the  ZDT  data  comparison. 
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fig.  8  \veragc  percent  of  total  execution  time  that  each  constituent 
optimization  algorithm  was  n%d  in  MOHO  for  each  of  the  six  ZDT  test 
problems,  flic  ascrage  is  taken  oser  30  runs  used  in  the  71)1 
comparison. 


Two-Objective  Hybrid  Optimizer  with  a  Response  Surface 

Besides  MOHO  algorithm,  we  have  also  developed  an  entirely  different  multi-objective  hybrid 
optimization  algorithm  that  currently  works  only  for  two-objective  problems  [16-19],  The  general 
schematic  of  this  hybrid  optimizer  is  given  in  Figure  9  and  Figure  10. 


Figure  9  -  Global  procedure  for  the  hybrid  optimization  method  HI 
The  method  starts  with  a  randomly  generated  population  matrix  P  in  the  domain  of  interest. 
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Thus,  successive  combinations  of  chromosomes  and  mutations  are  performed,  creating  new 
generations  until  an  optimum  value  is  found.  The  iterative  process  is  given  by 

*,w=^x*+^[«  +  F(P-t)]  (4) 


where 

Xi  is  the  i- th  individual  of  the  vector  of  parameters. 

a,  P  and  yare  three  members  of  population  matrix  P,  randomly  chosen. 

F  is  a  weight  function,  which  defines  the  mutation  (0.5  <  F  <  1). 
k  is  a  counter  for  the  generations. 

S1  and  <5^  are  delta  Dirac  functions  that  define  the  mutation 

In  this  minimization  process,  if  f(xk+1)  <  f(xk),  then  xk+1  replaces  xk  in  the  population  matrix  P 
Otherwise,  xk  is  kept  in  the  population  matrix. 

The  binomial  crossover  is  given  as 


S ;  =0,  if  R < CR 

(5) 

1,  if  R>CR 

where  CR  is  a  factor  that  defines  the  crossover  (0.5  <  CR  <  1)  and  R  is  a  random  number  with 
uniform  distribution  between  0  and  1. 

In  the  hybrid  optimizer  HI,  when  a  certain  percent  of  the  particles  find  a  minimum,  the 
algorithm  switches  automatically  to  the  differential  evolution  method  and  the  particles  are  forced 
to  breed.  If  there  is  an  improvement  in  the  objective  function,  the  algorithm  returns  to  the  particle 
swarm  method,  meaning  that  some  other  region  is  more  prone  to  having  a  global  minimum.  If 
there  is  no  improvement  on  the  objective  function,  this  can  indicate  that  this  region  already 
contains  the  global  value  expected  and  the  algorithm  automatically  switches  to  the  BFGS 
method  in  order  to  find  its  location  more  precisely.  In  Figure  9,  the  algorithm  returns  to  the 
particle  swarm  method  in  order  to  check  if  there  are  no  changes  in  this  location  and  the  entire 
procedure  repeats  itself.  After  some  maximum  number  of  iterations  is  performed  (e.g.,  five)  the 
process  stops. 

The  hybrid  optimizer  H2  [19]  is  quite  similar  to  the  HI,  except  by  the  fact  that  is  uses  a 
response  surface  method  at  some  point  of  the  optimization  task.  The  global  procedure  is 
illustrated  in  Fig.  10.  It  can  be  seen  from  this  figure  that  after  a  certain  number  of  objective 
functions  were  calculated,  all  this  information  was  used  to  obtain  a  response  surface.  Such  a 
response  surface  is  then  optimized  using  the  same  proposed  hybrid  code  defined  in  the  HI 
optimizer  so  that  it  fits  the  calculated  values  of  the  objective  function  as  closely  as  possible. 
New  values  of  the  objective  function  are  then  obtained  very  cheaply  by  interpolating  their  values 
from  the  response  surface. 

In  Figure  10,  if  the  BFGS  cannot  find  any  better  solution,  the  algorithm  uses  a  radial  basis 
function  interpolation  scheme  to  obtain  a  response  surface  and  then  optimizes  such  response 
surface  using  the  same  hybrid  algorithm  proposed.  When  the  minimum  value  of  this  response 
surface  is  found,  the  algorithm  checks  to  see  if  it  is  also  a  solution  of  the  original  problem.  Then, 
if  there  is  no  improvement  of  the  objective  function,  the  entire  population  is  eliminated  and  a 
new  population  is  generated  around  the  best  value  obtained  so  far.  The  algorithm  returns  to  the 
particle  swarm  method  in  order  to  check  if  there  are  no  changes  in  this  location  and  the  entire 
procedure  repeats  itself.  After  a  specified  maximum  number  of  iterations  is  performed  (e.g.,  five) 
the  process  stops. 
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Figure  10  -  Global  procedure  for  the  hybrid  optimization  method  H2 


The  new  algorithm,  which  will  be  called  H3,  is  an  extension  of  the  previous  ones.  The  global 
procedure  is  outlined  below: 

1.  Generate  an  initial  population,  using  the  real  function  (not  the  interpolated  one)  f(x). 
Call  this  population  Preai. 

2.  Determine  the  individual  that  has  the  minimum  value  of  the  objective  function,  over 
the  entire  population  Preai  and  call  this  individual  x^. 

3.  Determine  the  individual  that  is  more  distant  from  the  xbest,  over  the  entire  population 
Preai-  Call  this  individual  xfar. 

4.  Generate  a  response  surface,  with  the  methodology  at  section  2,  using  the  entire 
population  Preai  as  training  points.  Call  this  function  g(x). 

5.  Optimize  the  interpolated  function  g(x)  using  the  hybrid  optimizer  HI,  defined  above, 
and  call  the  optimum  variable  of  the  interpolated  function  as  xint.  During  the 
generation  of  the  internal  population  to  be  used  in  the  HI  optimizer,  consider  the 
upper  and  lower  bounds  limits  as  the  minimum  and  maximum  values  of  the 
population  Preai  in  order  to  not  extrapolate  the  response  surface. 

6.  If  the  real  objective  function  /(x,nt)  is  better  than  all  objective  function  of  the  population 
Preai-  replace  xfar  by  xint.  Else,  generate  a  new  individual,  using  the  Sobol  pseudo¬ 
random  sequence  generator  within  the  upper  and  lower  bounds  of  the  variables,  and 
replace  xfar  by  this  new  individual. 

7.  If  the  optimum  is  achieved,  stop  the  procedure.  Else,  return  to  step  2. 

From  the  sequence  above,  one  can  notice  that  the  number  of  times  that  the  real  objective 
function  f(x)  is  called  is  very  small.  Also,  from  step  6,  one  can  see  that  the  space  of  search  is 
reduced  at  each  iteration.  When  the  response  surface  g(x)  is  no  longer  capable  to  find  a 
minimum,  a  new  call  to  the  real  function  f(x)  is  made  to  generate  a  new  point  to  be  included  in 
the  interpolation.  Since  the  CPU  time  to  calculate  the  interpolated  function  is  very  short,  the 
maximum  number  of  iterations  of  the  HI  optimizer  can  be  very  large  (e.g.,  1000  iterations). 
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The  hybrid  optimizer  H3  was  compared  against  the  optimizer  HI,  H2  and  the  commercial 
code  IOSO  2.0  for  some  standard  test  functions.  The  first  test  function  was  the  Levy  #9  function 
[20],  which  has  625  local  minima  and  4  variables.  Such  function  is  defined  as 

/(x)  =  sin:(^-z,)  +  ^(zj-l)2'  l  +  10sin:(/rz,+l)  +(z4-l)2  (6) 


where 

z,  =  l  +  ^p(/  =  l,4)  (7) 

The  function  is  defined  within  the  interval  -10  <  x  <  10  and  its  minimum  is  f(x)  =  0  for  x  =  1. 
Figure  11  shows  the  optimization  history  of  the  IOSO,  HI,  H2  and  H3  optimizers.  Since  the  HI, 
H2  and  H3  optimizers  are  based  on  random  number  generators  (because  of  the  Particle  Swarm 
module),  we  present  the  best  and  worst  estimates  for  these  three  optimizers. 

From  Fig.  11,  it  can  be  seen  that  the  performance  of  the  H3  optimizer  is  very  close  to  the 
IOSO  commercial  code.  The  HI  code  is  the  worst  and  the  H2  optimizer  also  has  a  reasonable 
good  performance.  It  is  interesting  to  note  that  the  HI  code  is  the  only  one  that  doesn’t  have  a 
response  surface  model  implemented. 


Figure  1 1  -  Optimization  history  of  the  Levy  #9  function  for  the  (a)IOSO,  (b)HI-best,  (c )H2- 
best,  ( d)H3-best ,  (e)H1-worst,  (t)H2-worst  and  (g)H 3-worst  optimizers 

The  second  function  tested  was  the  Griewank  function  [20],  defined  as 


/(x)=p — n 

V  ’  4000  if 

jc(.  g  ]  -600, 600  [ 


cos 


t)+1 

vvr; 

,0  =  1,2) 


(8) 
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The  global  minima  for  this  function  is  located  at  x  =  0  and  is  f(x)  =  0.  The  function  has  an 
incredible  number  of  local  minima,  making  the  optimization  task  quite  difficult. 

Figure  12  shows  the  optimization  history  of  the  IOSO  commercially  available  Russian 
optimization  code  IOSO  [21],  HI,  H2  and  H3  optimizers.  Again,  the  best  and  worst  results  for 
HI,  H2  and  H3  are  presented.  From  this  figure,  it  is  clear  that  the  HI,  H2  and  H3  optimizers  are 
much  better  than  the  IOSO  commercial  code.  The  HI  code  was  the  best,  while  the  H2 
sometimes  stopped  at  some  local  minima.  The  worst  result  of  the  H3  optimizer  was,  however, 
better  than  the  result  obtained  by  IOSO.  It  is  worth  to  notice  that  the,  with  more  iterations,  the 
H3  code  could  reach  the  minimum  of  the  objective  function,  even  for  the  worst  result. 


Figure  12  -  Optimization  history  of  the  Griewank  function  for  the  (a)IOSO,  ( b)H1-best ,  (c )H2- 
best,  (d )H3-best,  (e)H1-worst,  (i)H2-worst  and  (g)H3-worst  optimizers 

The  next  test  function  implemented  was  the  Rosenbrook  function  [22],  which  is  defined  as 

/(x,,x2)  =  100(x2  -x] )2  +  (l-x,  )2  (9) 


The  function  is  defined  within  the  interval  -10  <  x  <  10  and  its  minimum  is  f(x)  =  0  for  x  =  1. 
Figure  13  shows  the  optimization  history  of  the  IOSO,  HI,  H2  and  H3  optimizers. 

For  this  test  function,  which  is  almost  flat  close  to  the  global  minima,  the  IOSO  code  was  the 
one  with  the  best  performance,  followed  by  the  H3  optimizer.  The  H2  did  not  perform  well  and 
the  HI  was  able  to  get  close  to  the  minimum,  but  with  a  huge  number  of  objective  function 
calculations.  When  looking  at  the  H3  results,  the  final  value  of  the  objective  function  differs  by 
some  orders  of  magnitude.  However,  the  optimum  solution  obtained  with  this  new  optimizer  was 
Xi  =  0.9996  and  x2  =  0.9992,  while  the  IOSO  obtained  x,  =  1.0000  and  x2  =  1.0000.  Thus  the 
relative  error  among  the  variables  was  less  than  0.01%,  indicating  that  despite  of  the 
discrepancy  among  the  final  value  of  the  objective  function,  the  H3  code  was  able  to  recover  the 
value  of  the  optimum  variables  with  a  negligible  relative  error. 
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Figure  13  -  Optimization  history  of  the  Rosenbrook  function  for  the  (a)IOSO,  (b)H1-best,  ( c)H2 • 
best,  ( d)H3-best ,  (e)H1-worst,  (1)H2-worst  and  (g)H 3-worst  optimizers 


The  last  test  function  analyzed  was  the  Mielle-Cantrel  function  [23],  which  is  defined  as 


/(*)  = 


exp 


+  100(x,  -x3)6  +arctan4  (x3  -x4)  +  x,2 


(10) 


The  function  is  defined  within  the  interval  -10  <  x  <  10  and  its  minimum  is  f(x)  =  0  for  =  0 
and  x2  =  x3  =  x4  =  1.  Figure  14  shows  the  optimization  history  of  the  IOSO,  HI,  H2  and  H3 
optimizers.  Again,  the  best  and  worst  results  for  HI,  H2  and  H3  are  presented. 


Figure  14  -  Optimization  history  of  the  Griewank  function  for  the  (a)IOSO,  (b)HI-best,  (c )H2- 
best,  (d )H3-best,  ( e)H1-worst ,  (f )H2-worst  and  (g )H3-worst  optimizers 
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For  this  function,  the  IOSO  code  was  the  best,  followed  by  the  H3.  The  H2  code  performed 
very  bad  again  the  the  HI  was  able  to  get  to  the  global  mininum  after  a  huge  numbe  of  objective 
function  calculations.  As  occurred  with  the  Rosenbrook  function,  in  spite  of  the  H3  result  for  the 
objective  function  differ  from  the  IOSO  code,  the  final  values  of  the  variables  were 
Xi=4.0981x10'8,  x2=0.9864,  x3=0.9688  and  x4=0.9626  for  the  H3  optimizer  and  x^-0. 1216x10  °, 
x2=1 .002,  x3=0.9957  and  x4=0.9962  for  the  IOSO  code. 


ALGORITHM  DEVELOPMENTS  FOR  RESPONSE  SURFACES 

Three  distinctly  different  algorithms  have  been  developed  for  automatic  generation  of  multi¬ 
dimensional  response  surfaces  (metamodels)  supported  by  a  small  number  of  high  fidelity 
values  of  the  objective  functions.  These  three  methods  were  based,  respectively,  on: 

(1)  Wavelet  based  neural  networks  [24] 

(2)  Polynomial  radial  basis  functions  [25-28] 

(3)  Self-organizing  maps  and  graph  theory  [35] 

Response  Surfaces  using  Wavelet-Based  Neural  Networks  (WNN) 

This  is  an  alternative  technique  to  search  the  proper  activation  functions  for  the  construction 
of  WNN  that  helps  to  build  response  surface  for  predicting  multi-dimensional  function  spaces 
accurately  and  efficiently.  Several  modifications  to  the  architecture  and  basis  function  in  WNN 
were  suggested  and  tested  using  diverse  test  functions.  Simulation  results  show  that  WNN  for 
fitting  multi-dimensional  surfaces  having  moderate  number  of  dimensions  (<35)  can  be 
implemented  accurately.  The  modified  WNN  developed  in  such  a  way  had  about  5  -  7  % 
average  absolute  error  on  training  data  in  most  of  the  test  problems.  The  final  testing  for  most  of 
the  test  functions  was  done  using  1000  -  1200  data  points.  It  was  found  that  about  ten  to  twelve 
activation  nodes  in  the  hidden  layer  of  WNN  were  adequate  for  good  predictions,  that  is,  5% 
average  absolute  error  on  training  data.  Further  addition  of  activation  nodes  in  WNN  improved 
the  accuracy  on  training  data  only  slightly,  but  did  not  help  in  improving  the  accuracy  on  testing 
data  further  [24],  This  implies  that  for  the  given  dataset,  a  WNN  with  about  ten  to  twelve 
activation  nodes  extracted  most  of  the  information  regarding  the  topology  of  the  functional 
space.  Finally,  a  hybrid  WNN  network  was  developed  using  the  concept  of  having  multiple 
WNNs  and  helped  in  increasing  the  accuracy  of  prediction  of  highly  non-linear  functions.  From 
Figure  15  it  appears  that  WNN  is  appropriate  for  low-dimensional  response  surfaces. 


□  RSQ  ■  CPU  Time 


u  of  vanables 


Figure  15  -  R2  accuracy  results  and  CPU  time  for  a  WNN  with  one  subnet. 
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Response  Surfaces  using  Polynomials  of  Radial  Basis  Functions 

Let  us  suppose  that  we  have  a  function  of  L  variables  x„  /  =  1,...,  L.  The  RBF  model  used  in 
this  work  has  the  following  form  [25-28] 

N  ML 

s  ( x) = /  ( x) = S  aA  lx  -  x, ■  |) + £  S  't*p*  (*« ) + A  (1 1 ) 

i= 1  *= I  1=1 

where  x={x,,...,x( . xL)  and  f(x)  is  known  for  a  series  of  points  x  .  Here,  pfc(x,-)  is  one  of  the  M 

terms  of  a  given  basis  of  polynomials  [29].  This  approximation  is  solved  for  the  q  and  /?,* 
unknowns  from  the  system  of  N  linear  equations,  subject 

N 

7=1 

M 

N 

ZOryA(^i)  =  0 

7=1 

N 

i>,=° 

J-l 

In  this  work,  the  polynomial  part  of  Eq.  (1 1 )  was  taken  as 

Pkixi)  =  xi 

and  the  radial  basis  functions  are  selected  among  the  following  [30] 


Multiquadrics:  0(jx(  -x,  J  =  yj^xl  -x] )  +cy2 
Gaussian:  <p{\xj  -x7|]  =  expj"-c"  (x,.  -x;  ) 

Squared  multiquadrics:  <p[\xi -xr.|)  =  (x(.  —Xj )  +c2 

\j(x,  ~xj):  +cj 

with  the  shape  parameter  c;  kept  constant  as  1/A/.  The  shape  parameter  is  used  to  control  the 
smoothness  of  the  RBF. 

From  Eq.  11  one  can  notice  that  a  polynomial  of  order  M  is  added  to  the  radial  basis 
function.  M  was  limited  to  an  upper  value  of  6.  After  inspecting  Eqs.  (11-14),  one  can  easily 
check  that  the  final  linear  system  has  [(N+M*L)+ 1]  equations.  Some  tests  were  made  using  the 
cross-product  polynomials  (x,X/X*...),  but  the  improvements  on  the  results  were  irrelevant.  Also, 
other  types  of  RBFs  were  used,  but  no  improvement  on  the  interpolation  was  observed. 

To  provide  a  more  complete  picture  of  metamodel  accuracy,  three  different  metrics  are  used: 
R  Square,  Relative  Average  Absolute  Error  (RAAE),  and  Relative  Maximum  Absolute  Error 
(RMAE)  [31]. 


Cubical  multiquadrics:  ^(jx,  -x;|j  = 


(12) 

(13) 

(14) 

(15) 

(16) 

(17) 

(18) 
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a)  R  Square  (R2) 


R2  =1- 


'Z(yi-yif 

;=1 _ 

tiy-yf 


MSE 

variance 


(19) 


where  y  is  the  corresponding  predicted  value  for  the  observed  value  y,;  y  is  the  mean  of  the 

observed  values.  While  MSE  (Mean  Square  Error)  represents  the  departure  of  the  metamodel 
from  the  real  simulation  model,  the  variance  captures  how  irregular  the  problem  is.  The  larger 
the  value  of  R2,  the  more  accurate  the  metamodel. 

b)  Relative  Average  Absolute  Error  (RAAE) 

Y\y.-y\ 

RAAE  =  -  (20) 

n*STD 

where  STD  stands  for  standard  deviation.  The  smaller  the  value  of  RAAE,  the  more  accurate 
the  metamodel. 


c)  Relative  Maximum  Absolute  Error  (RMAE) 

RMAE  =  max(|^i-^l|.|^2-j>2l.-.k-^|) 

STD 


(21) 


Large  RMAE  indicates  large  error  in  one  region  of  the  design  space  even  though  the  overall 
accuracy  indicated  by  R2  and  RAAE  can  be  very  good  Therefore,  a  small  RMAE  is  preferred 
However,  since  this  metric  cannot  show  the  overall  performance  in  the  design  space,  it  is  not  as 
important  as  R2  and  RAAE. 

Although  the  R2,  RAAE  and  RMAE  are  useful  to  ascertain  the  accuracy  of  the  interpolation, 
they  can  fail  in  some  cases.  For  the  R2  metric,  for  example,  if  one  of  the  testing  points  has  a 
huge  deviation  of  the  exact  value,  such  discrepancy  might  affect  the  entire  sum  appearing  on 
Eq.  (19)  and,  even  if  all  the  other  testing  points  are  accurately  interpolated.  Similarly,  the  R2 
result  can  be  very  bad.  For  this  reason,  we  also  calculate  the  percentage  deviation  of  the  exact 
value  of  each  testing  point.  Such  deviations  are  collected  according  to  6  ranges  of  errors:  0- 
10%;  10-20%;  20-50%;  50-100%;  100-200%;  >200%.  Thus,  a  interpolation  that  has  all  testing 
points  within  the  interval  of  0  to  10%  of  relative  error  might  be  considered  good  in  comparison  to 
another  one  where  the  points  are  all  spread  along  the  intervals  from  10%  to  200%. 

The  procedure  was  shown  to  work  for  fitting  highly  non-linear  functions  where  large  number 
of  variables  were  involved.  The  RBF  technique  is  quite  powerful  regarding  its  accuracy  and 
reduced  CPU  time.  Even  when  the  number  of  variables  were  as  large  as  500,  the  RBF 
approximation  was  very  fast  and  robust.  This  is  a  promising  technique  for  real  time 
interpolations  such  as  target  tracking  or  image  recognition.  Some  comparisons  were  made  with 
the  Wavelets  Neural  Network  showing  a  general  superior  behavior  of  the  RBF.  A  new  hybrid 
optimizer  based  on  the  RBF  interpolation  was  also  presented,  which  is  much  more  efficient  than 
our  previous  ones.  In  fact,  its  performance  is  very  close  to  IOSO  software  [21  ]  which  is  one  of 
the  best  commercial  optimizers  available. 

Although  the  R2,  RAAE  and  RMAE  are  useful  to  ascertain  the  accuracy  of  the  interpolation, 
they  can  fail  in  some  cases.  For  the  R2  metric,  for  example,  if  one  of  the  testing  points  has  a 
huge  deviation  from  the  exact  value,  such  discrepancy  might  affect  the  entire  sum  appearing  in 
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Eq.  (19)  and,  even  if  all  the  other  testing  points  are  accurately  interpolated,  the  R2  result  can  be 
very  bad.  For  this  reason,  we  also  calculate  the  percentage  deviation  from  the  exact  value  of 
each  testing  point. 

In  order  to  test  the  accuracy  of  the  RBF  models  proposed,  295  test  cases  were  used, 
representing  linear  and  non-linear  problems  with  up  to  100  variables.  Such  problems  were 
selected  from  a  collection  of  395  problems  (actually  295  documented  test  cases),  created  by 
Hock  and  Schittkowski  [32]  and  Schittkowski  [33].  Figure  16  shows  the  number  of  variables  of 
each  one  of  the  test  cases.  Note  that  there  are  395  test  cases,  but  some  of  them  were  not  used 
because  test  cases  ranging  from  120  to  200  were  not  defined  in  the  original  publications. 


Test  problem 

Figure  16.  Number  of  variables  for  each  of  the  295  test  cases  considered  for  RBF  fit. 

In  order  to  verify  the  accuracy  of  the  interpolation  over  different  number  of  training  points, 
three  sets  were  defined;  scarce,  small  and  medium.  Also,  the  number  of  corresponding  testing 
points  varied  according  to  the  number  of  training  points.  Table  1  presents  these  three  sets, 
based  on  the  number  of  dimensions  (variables)  L  of  the  particular  problem. 


Table  1  Number  of  training  and  testing  points  where  L  is  the  number  of  variables 


Number  of  training  points 

Number  of  testing  points 

Scarce  set 

3  L 

300  L 

Small  set 

10  L 

1000  L 

Medium  set 

50  L 

10000 L 

For  example,  if  we  are  trying  to  fit  a  function  of  say,  500  variables,  then  a  scarce  set  will 
have  3*500  =  1500  training  points  and  300*500  =  150,000  testing  points.  The  large  matrix 
resulting  from  Eq.  (11)  assuming  that  we  use  polynomial  of  order  M  =  6,  will  be  of  size  (6*500  + 
1,500  +  1  =  4501)  squared.  Such  large  resulting  matrices  have  been  solved  iteratively  using  bi¬ 
conjugate  iterative  method.  Cross-validation  was  performed  for  each  combination  of  these,  say, 
4501  x  4501  coefficients. 

Figure  17  shows  the  R2  metric  for  all  test  cases,  using  the  scarce  set  (3*L)  of  training  points 
only  for  the  interpolation  functions  that  presented  some  meaningful  results.  It  can  be  noticed 
that  the  results  are  all  spread  from  0  (completely  inadequate  interpolation)  to  1  (very  accurate 
multi-dimensional  interpolation).  Qualitatively,  the  darkest  pictures  mean  better  results.  The 
fittest  polynomial  RBF  method  provides  the  best  interpolation  among  all  test  functions.  The 
RBF1,  RBF4  and  RBF7  based  methods  also  performed  a  good  interpolation,  based  on  the  R2 
metric. 
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Figure  17.  R2  metric  for  various  fitting  method  using  scarce  set  (3*L)  of  training  points. 


Figure  18  shows  the  computing  time  required  to  interpolate  each  test  function,  using  the 
scarce  set  (3*L)  of  training  points  for  the  fittest  polynomial  RBF  method.  For  most  of  the  cases, 
the  computing  time  was  less  than  10  seconds,  using  a  Pentium  IV  3  GHz  with  1Gb  RAM 
running  Rocks  4.2.1  (Cydonia)  and  the  Intel©  ifort  (IFORT)  9.1  20060323  compiler  with  the 
directives  “-static-libcxa  -static  -03  -r8”.  In  fact,  the  highest  dimensional  test  case,  which  has 
100  variables  required  only  50  seconds  to  be  interpolated. 
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Test  problem  (scarce  set) 

Figure  18.  Computing  time  required  for  the  scarce  set  (3*L)  of  training  points  using  the  fittest 

polynomial  RBF  method. 
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Response  Surfaces  Using  a  Hybridized  Multi-Layer  Self-Organizing 

Self-organizing  algorithms  come  from  the  field  of  cybernetics  [34],  The  idea  is  that  this 
algorithm  “learns”  the  black  box  model  it  is  trying  to  mimic  and  makes  the  surrogate  model  only 
as  complex  as  is  required.  By  allowing  the  program  to  gradually  complicate  the  final  model,  the 
construction  and  evaluation  time  of  surrogate  model  is  automatically  optimized  for  a  given  task. 
In  this  multi-layer  self-organizing  algorithm  very  simple  polynomial  basis  functions  are  used  to 
generate  models  describing  highly  non-linear  multi-variable  functions. 

Based  on  these  concepts,  we  have  developed  and  thoroughly  tested  a  hybridized  multi-layer 
self-organizing  algorithm  [35].  In  this  method,  the  algorithm  can  choose  the  basis  functions  that 
locally  capture  the  interaction  between  the  variables  in  the  most  accurate  fashion.  Two 
hybridized  methods  will  be  presented  [35].  In  the  first  method,  the  hybridized  algorithm  will  be 
able  to  choose  from  linear,  quadratic,  cubic  and  quartic  basis  polynomials,  as  needed.  In  the 
second  hybridized  algorithm,  the  same  basis  polynomial  options  from  the  first  hybrid  method  are 
given  to  the  algorithm  plus  the  capability  to  choose  a  simple  RBF  as  a  basis  function  to  describe 
interactions  among  the  variables.  These  methods  will  be  compared  to  the  single  basis  function 
response  surface  models  using  the  same  test  problems. 

This  multi-layer  self-organizing  algorithms  with  different  order  polynomial  basis  functions  will 
be  compared  using  Schittkowski's  suite  of  295  non-linear  optimization  test  cases  [32,  33]. 

The  self-organizing  algorithm  used  in  this  work  is  the  multilayer  algorithm.  In  the  multilayer 
algorithm  the  design  variables  are  permutated,  in  pairs,  to  form  nodes.  At  each  node  a  least 
squares  regression  is  performed  using  the  two  variables  input  to  the  nodes.  These  are  variable 
vectors  that  are  the  size  of  the  sample  population.  So,  the  output  of  the  node  is  a  vector  of  the 
predicted  values  from  the  regression 

The  polynomial  used  for  the  regression  is  a  first  order  or  second  order  polynomial.  For 
instance,  a  second  order  basis  polynomial  would  be: 


k.n  k.n  ,  k,n  k.n  ,  k.n  k,n  .  k.n  k,n  k.n  .  k,n  k.n  k.n  ,  k.n  k,n  k.n 

yitJ  =  a0  +  x{  +  a2  x  j  +  a}  x{  x  ■  +  aA  xt  x{  +  a5  xj  xj 


where 

i  -  1,2,K  , number  of  inputs  to  given  layer 
j  =  1,2,K  , number  of  inputs  to  given  layer 
k  =  current  layer 

n  =  node  number  at  current  layer 


(22) 


The  output  of  the  node  is  the  vector  predicted  y  values  for  the  given  input.  The  output  of  a  node 
in  layer  /c-1  becomes  the  input  (provides  an  x,  vector)  for  layer  k. 

The  notation  in  equation  one  is  designed  to  inform  the  reader  that  the  functions  and 
polynomial  coefficients  pertain  to  a  particular  layer  and  particular  node  on  the  layer.  The 
notation  should  also  give  the  reader  a  feel  for  the  computational  resources  needed  to  create 
and  maintain  a  multilayer  self-organizing  model. 

Figure  19  shows  a  possible  multilayer  surrogate  model  for  a  three  variable  engineering 
model.  In  the  bottom  layer  (the  zero  layer)  actual  design  variables  are  the  nodes  in  the  layer. 
These  become  the  x  inputs  to  layer  1.  The  nodes  for  layer  1  are  created  by  permuting  the  input 
variables  and  performing  least  squares  fit  using  Equation  22  and  the  actual  response  from  the 
actual  function  (i.e. ,  objective  function,  engineering  simulation,  etc.).  Once  layer  1  is  made  layer 
2  is  created  but,  now  the  nodes  of  layer  1  provide  the  x's  to  make  the  new  nodes  using  Equation 
1.  Now  when  layer  3  is  to  be  made  the  3rd  node  of  layer  2  is  not  included.  For  now,  we  will  just 
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say  that  the  results  of  that  regression  were  not  good  enough  to  be  used  to  make  the  3rd  layer. 
Since  only  two  nodes  from  layer  2  were  used  to  make  layer  three,  only  one  node  can  be  created 
in  layer  three  and  the  process  of  making  models  ends  there. 


Figure  19. 


Layer  3 


Layer  2 


Layer  1 


Input  Variables 


Example  of  a  three-layer  model  for  a  multi-layer  self-organizing  algorithm. 


The  third  node  in  layer  2  is  not  used  to  make  nodes  in  layer  3  in  Figure  19.  A  selection  criterion 
is  used  to  determine  if  the  information  in  a  node  get  passed  on  to  the  next  layer.  Madala  and 
Ivakhnenko  [34]  suggest  that  using  the  following  equation  is  an  appropriate  means  for  checking 
the  quality  of  a  node  and  can  be  a  selection  criterion. 


K.V-A;  (23) 

pe  _ 

X  (yP  -y)2 

pzNB 

Where  : 

y  =  desired  values 

^  =  the  predicted  values 

y  —  the  mean  of  the  desired  values 

In  the  multilayer  algorithm  a  threshold  is  set  for  the  maximum  acceptable  value  of  Eq.  (23). 
Nodes  that  are  within  the  threshold  are  passed  on  to  the  next  layer.  For  each  new  layer  the 
threshold  is  made  smaller.  This  serves  to  minimize  the  amount  of  nodes  in  each  layer  to  only 
the  information  that  is  needed  to  improve  the  network.  This  trimming  of  nodes  is  crucial  to 
keeping  the  method  compact  and  efficient. 

The  preliminary  results  for  the  hybrid  model,  polynomial  basis  only,  combined  with  and  RBF 
fit  of  the  model  residuals  is  presented  for  the  295  test  cases  of  Schitkowski  [32,  33], 
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Hybrid  Multi-Layer  Self-Organizing  Model  with  RBF  on  Residual  Tlm«  to  Model  Problems  -  Scercs  Deta  Set  -  Hybrid  Model  with  RBF  on  Rendu 


Hybrid  Multi-Layer  Self-Organizing  Model  with  RBF  on  Residuals  Time  to  Model  Problems  -  Sms#  Det#  Set  -  Hybrid  Model  with  RBF  on  Resldui 


Hybrid  Multi-Layer  Self-Organizing  Model  with  RBF  on  Residuals  Time  to  Model  Problems  -  Medium  Dete  Set  -  Hybnd  Model  with  RBF  on  Residi 


The  self-organizing  multi-level  fitting  concept  for  multi-dimensional  response  surfaces  appears 
to  be  suitable  for  very  large  dimensional  response  surfaces  where  the  objective  function 
depends  on  thousands  of  design  variables.  For  smaller  dimensionality  response  surfaces,  this 
algorithm  is  too  costly  and  cumbersome.  The  option  of  the  multi-layer  self-organizing  algorithm 
that  uses  lower  level  (quadratic)  local  fits,  and  then  uses  a  polynomial  of  radial  basis  functions 
to  fit  the  errors  of  the  quadratic  local  fits,  offers  the  highest  accuracy. 
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Uncertainty  Evaluation  using  Bayesian  Statistics 

The  algorithmic  methods  for  the  solution  of  inverse  problems  could  be  grouped  into  two 
basic  approaches:  pure  inverse  methods  and  optimization-based  methods.  That  is,  in  most  of 
the  iterative  methods  for  the  solution  of  inverse  problems,  sophisticated  regularization 
formulations  are  used  in  order  to  prevent  numerical  errors  from  growing  exponentially.  In  many 
other  methods,  different  optimization  algorithms  are  used  to  solve  de  facto  inverse  problems  by 
minimizing  typically  least-squares  norms  resulting  from  such  algorithms.  For  an  example,  see 
Hernandez  et  al.  [36].  A  variety  of  techniques  is  currently  available  for  the  solution  of  inverse 
problems.  However,  one  common  approach  relies  on  the  minimization  of  an  objective  function 
that  generally  involves  the  squared  difference  between  measured  and  estimated  variables,  such 
as  the  least-squares  norm,  as  well  as  some  kind  of  regularization  term.  Despite  the  fact  that  the 
minimization  of  the  least-squares  norm  is  indiscriminately  used,  it  only  yields  maximum 
likelihood  estimates  if  the  following  statistical  hypotheses  are  valid: 

a)  the  errors  in  the  measured  variables  are  additive,  uncorrelated,  normally  distributed,  with 
zero  mean  and  known  constant  standard-deviation; 

b)  only  the  measured  variables  appearing  in  the  objective  function  contain  errors;  and 

c)  there  is  no  prior  information  regarding  the  values  and  uncertainties  of  the  unknown 
parameters. 

Although  very  popular  and  useful  in  many  situations,  the  minimization  of  the  least-squares 
norm  is  a  non-Bayesian  estimator.  A  Bayesian  estimator  is  concerned  with  the  analysis  of  the 
posterior  probability  density ,  which  is  the  conditional  probability  of  the  parameters  given  the 
measurements,  while  the  likelihood  is  the  conditional  probability  of  the  measurements  given  the 
parameters. 

If  we  assume  the  parameters  and  the  measurements  to  be  independent  Gaussian  random 
variables,  with  known  means  and  covariance  matrices,  and  that  the  measurement  errors  are 
additive,  a  closed  form  expression  can  be  derived  for  the  posterior  probability  density.  In  this 
case,  the  estimator  that  maximizes  the  posterior  probability  density  can  be  recast  in  the  form  of 
a  minimization  problem  involving  the  maximum  a  posteriori  objective  function. 

On  the  other  hand,  if  different  prior  probability  densities  are  assumed  for  the  parameters,  the 
posterior  probability  distribution  does  not  allow  an  analytical  treatment.  In  this  case,  Markov 
Chain  Monte  Carlo  (MCMC)  methods  are  used  to  draw  samples  of  all  possible  parameters,  so 
that  inference  on  the  posterior  probability  becomes  inference  on  the  samples.  As  such,  the 
number  of  samples  required  to  accurately  approximate  the  posterior  distribution  is  generally 
large,  resulting  in  prohibitive  computational  costs  for  many  practical  applications.  Such  is 
specially  the  case  when  the  solution  of  the  forward  problem,  which  is  needed  for  the 
computation  of  the  likelihood  function,  requires  large  computational  times. 

In  this  work,  we  examined  the  use  of  radial  basis  functions  to  interpolate  the  likelihood 
function,  in  order  to  reduce  the  computational  cost  of  MCMC  methods  in  the  Bayesian  approach 
of  solution  of  inverse  problems.  The  likelihood  function  is  interpolated  in  the  space  of  all 
possible  parameters,  by  using  a  small  number  of  solutions  of  the  forward  model  as  compared  to 
that  required  for  the  implementation  of  the  MCMC  methods.  Hence,  the  interpolated  likelihood 
function,  instead  of  the  actual  function,  is  used  afterwards  in  the  sampling  procedure  of  the 
MCMC  method,  providing  a  substantial  reduction  of  computational  costs. 

Although  very  popular  and  useful  in  many  situations,  the  minimization  of  the  least-squares 
norm  is  a  non-Bayesian  estimator.  A  Bayesian  estimator  is  concerned  with  the  analysis  of  the 
posterior  probability  density,  which  is  the  conditional  probability  of  the  parameters  given  the 
measurements,  while  the  likelihood  is  the  conditional  probability  of  the  measurements  given  the 
parameters  [37-40]. 
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Markov  Chain  Monte  Carlo  (MCMC)  Methods 

In  the  Bayesian  approach  to  statistics,  an  attempt  is  made  to  utilize  all  available  information 
in  order  to  reduce  the  amount  of  uncertainty  present  in  an  inferential  or  decision-making 
problem.  As  new  information  is  obtained,  it  is  combined  with  any  previous  information  to  form 
the  basis  for  statistical  procedures.  The  formal  mechanism  used  to  combine  the  new  information 
with  the  previously  available  information  is  known  as  Bayes’  theorem  [40],  Therefore,  the  term 
Bayesian  is  often  used  to  describe  the  so-called  statistical  inversion  approach,  which  is  based 
on  the  following  principles  [38]: 

1.  All  variables  included  in  the  model  are  modeled  as  random  variables. 

2.  The  randomness  describes  the  degree  of  information  concerning  their  realizations. 

3.  The  degree  of  information  concerning  these  values  is  coded  in  probability  distributions. 

4.  The  solution  of  the  inverse  problem  is  the  posterior  probability  distribution. 

Consider,  for  the  sake  of  generality,  the  vector  of  parameters  appearing  in  the  physical  model 
formulation  as 


Pr  =  [Pi,P2 . PN) 


(24) 


and  the  vector  of  available  measurements  as 


Y 


(25) 


where  N  is  the  number  of  parameters  and  /is  the  number  of  measurements.  Bayes’  theorem 
can  then  be  stated  as  [38]: 


(P)  =  *(P|V)  =  ^r°r(J(Y)(Y|P)  (26) 

where  xpoaterioJlP)  is  the  posterior  probability  density,  that  is,  the  conditional  probability  of  the 
parameters  P  given  the  measurements  Y;  nphor(P)  is  the  prior  density,  that  is,  the  coded 
information  about  the  parameters  prior  to  the  measurements;  zz(Y|P)  is  the  likelihood  function, 
which  expresses  the  likelihood  of  different  measurement  outcomes  Y  with  P  given;  and  My)  is 
the  marginal  probability  density  of  the  measurements,  which  plays  the  role  of  a  normalizing 
constant. 

In  practice,  such  normalizing  constant  is  difficult  to  compute  and  numerical  techniques,  such 
as  Markov  Chain  Monte  Carlo  Methods,  are  required  in  order  to  obtain  samples  that  accurately 
represent  the  posterior  probability  density.  In  order  to  implement  the  MCMC,  a  density  function 
q(P*,P  1J)  is  required  that  gives  the  probability  of  moving  from  the  current  state  in  the  chain  P 
to  a  new  state  P*. 

The  Metropolis-Hastings  algorithm  was  used  in  this  work  to  implement  the  MCMC  method.  It 
can  be  summarized  in  the  following  steps  [38-40]: 

1. Sample  a  Candidate  Point  P*  from  a  jumping  distribution  g(P*,  P(M>). 

2.  Calculate: 
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a  -  mm 


(27) 


,  ;r(P,|Y)?(P('-|),P’) 

’/r(P,'-|)|Y)<7(P\P,'-,))_ 

3.  Generate  a  random  value  L/  that  is  uniformly  distributed  on  (0,1). 

4.  If  U  <«define  P-P*,  otherwise,  define  P-  P(M). 

5.  Return  to  step  1  in  order  to  generate  the  sequence  (P! ,  P  . P"}. 

In  this  way,  we  get  a  sequence  that  represents  the  posterior  distribution  and  inference  on 

this  distribution  is  obtained  from  inference  on  the  samples  {P  ,  P  . Pn}.  We  note  that  values 

of  P7  must  be  ignored  until  the  chain  has  not  converged  to  equilibrium.  For  more  details  on 
theoretical  aspects  of  the  Metropolis-Hastings  algorithm  and  MCMC  methods,  the  reader  should 
consult  references  [38-40[. 

We  assume  in  this  work  that  the  errors  in  the  measured  variables  are  additive,  uncorrelated, 
normally  distributed,  with  zero  mean  and  known  constant  standard  deviation.  Hence,  the 
likelihood  function  is  given  by  [37-40] 


/r(Y|P)  =  (2/r)_/  2 1 W'1)"'  expj -y[Y-T(P)]r \V[Y -T(P)]j  (28) 

where  T  is  the  vector  of  estimated  variables,  obtained  from  the  solution  of  the  forward  model 
with  an  estimate  for  the  parameters  P,  and  W  is  the  inverse  of  the  covariance  matrix  of  the 
measurements. 

An  example  of  application  of  this  combined  algorithmic  approach  is  described  below  as 
applied  to  the  inverse  problem  of  estimating  the  three  thermal  conductivity  components  of  an 
orthotropic  solid  [41-43]. 

As  an  example  for  the  proposed  research,  the  physical  problem  considered  here  involves 
the  three-dimensional  linear  heat  conduction  in  an  orthotropic  solid,  with  thermal  conductivity 
components  A,* ,  A:,*  and  Aj  in  the  x*,  y*  and  z*  directions,  respectively.  The  solid  is  a 
parallelepiped  with  sides  a*,  b*  and  c*,  initially  at  temperature  T’ .  For  times  t*>  0,  uniform  heat 
fluxes  q\(t) ,  q'2(t )  and  qj(/)  are  applied  at  the  surfaces  x*  =  a*,  y*  =  b*  and  z*  =  c*,  respectively. 

The  other  three  remaining  surfaces  at  x*  =  0,  y*  =  0  and  z*  =  0  are  maintained  at  a  constant 
temperature  (equal  to  the  initial  temperature).  The  mathematical  formulation  of  this  type  of 
physical  problem  is  given  in  dimensionless  form  by  the  following  parabolic  partial  differential 
equation  where  T(x,y,z;t)  is  the  unsteady  three-dimensional  temperature  field: 


d2  T  .  a2  T 
k\  2  +*2  "7 

o  d  y~ 

^2  y*  0  j 

+  - —  = -  inOcxCtf,  0<y</>,0<z<c;/>0 

3  z1  dt 

(29.a) 

T  =  0  at  jc  =  0  ; 

d  T 

k\  -r—  =  q\(t)  at.v  =  a  , 
a  .r 

for  /  >  0 

(29.b,c) 

T  -  0  at  j;  -  0  ; 

^  j 

k2  —  =  q20)aly  =  b  , 
dy 

for  /  >  0 

(29.d,e) 

T  =  0  at  z  =  0  ; 

0  J 

ki  —  =  q3(t)alz=c  , 

for  /  >  0 

(29.f,g) 

7=0  for  f  =  0 

;  in0<x<a,0<y< 

b  ,  0  <  Z  <  C 

(29. h) 

In  the  direct  problem  associated  with  the  physical  problem  described  above,  the  three 
thermal  conductivity  components  k 1t  k2  and  k3,  as  well  as  the  solid  geometry,  initial  and 
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boundary  conditions,  are  known.  The  objective  of  the  direct  problem  is  then  to  determine  the 
transient  temperature  field  T(x,  y,  z,  t)  in  the  body. 

We  assume  the  boundary  heat  fluxes  to  be  pulses  of  finite  duration  th ,  that  is, 


9y(0  = 


I  <7,  .  for 

jo  ,  for 


0  <l<th 
l>'h 


for  j  -  1,2,3  (30) 


where  qt  is  the  dimensionless  magnitude  of  the  applied  heat  flux  during  the  heating  period  0  <  t 

<th. 

The  solution  of  direct  problem  given  by  equation  (1 )  for  0  <  t  <  th  can  be  obtained  with  the 
Classical  Integral  Transform  Technique  as 


T(x,y,z,t)m-l-YJ^e(Am,x)  n  T  y0)  (1  ^  ^  (31) 

a  b  c  —  —  — 


o=\  n=\  m=\ 


where 


/  ”  /  i\W  +  l  /  |\0+1 

(~t)  9i  ,  (-0  <h  |  (-0  <73 


„  ..  s  _  A  Yo  KYo  4  Pn 
ru~p"rJ - — 

and  the  eigenfunctions  are  given  by 

G(A.m,x)  =  sin(Am  x) 

n(Pn>y)  =  si"(P„  y) 

'Y{y0,:)  =  sm(yu  z) 

with  eigenvalues 

(2m-l)_  (2/7-1)  (2o-l)  _ 

^nt  _  ^  1  Al  _  .  ft  •>  Yo  -  ft 


2a 


2b 


2c 


;  m,n,o  =  1,2,. 


(32) 


(33) 


(34) 


For  the  inverse  problem  considered  here,  the  thermal  conductivity  components  /q,  k2  and  k3 
are  regarded  as  unknown,  while  the  other  quantities  appearing  in  the  formulation  of  the  direct 
problem  described  above  are  assumed  to  be  known  with  high  degree  of  accuracy.  The  vector  of 
unknown  parameters  PT  =  [/q,  k2 ,  k3]  is  estimated  by  using  the  Bayesian  technique  described 
next. 

Simulated  temperature  measurements,  obtained  with  the  solution  of  the  direct  problem  with 
m  =  n  =  o  =  50,  were  used  in  the  inverse  analysis.  This  number  of  terms  in  the  series  (4. a)  was 
sufficient  to  achieve  the  desired  convergence  in  the  solution.  In  order  to  avoid  the  inverse  crime 
of  using  the  same  solution  of  the  direct  problem  in  the  generation  of  the  simulated 
measurements  and  in  the  solution  of  the  inverse  problem  [38],  for  the  application  of  the 
Metropolis-Hastings  algorithm  we  used  m  =  n  =  o  =  20  in  the  series-solution.  The  same 
optimally  experimental  variables  selected  in  [41]  were  used  here:  for  /Ci  =  1,  k2  =  15  and  k3=  15, 
we  have  b/a  =  c/a  =  q2/<?i  =  PpQi  =  3.87  and  the  heating  and  final  times  were  th  =  tf  =  1. 

Results  are  presented  below  for  the  estimation  of  the  thermal  conductivity  components  PT  = 
[/q,  k2  ,  k3]  with  the  Metropolis  -  Hastings  algorithm,  by  using  20000  samples.  A  uniform 
distribution  was  used  as  prior  information  for  the  thermal  conductivity  components.  The 
unknowns  were  assumed  to  be  in  the  broad  physically  meaningful  intervals  given  by: 
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0.1  <  h  <  50 


0.1  <k2<50 


0.1  <  k3  <  50 


Notice  that  the  prior  distributions  assumed  for  the  parameters  are  non-informative,  that  is, 
the  intervals  in  which  the  parameters  are  uniformly  distributed  encompass  most  of  engineering 
materials,  ranging  from  mild  insulators  to  metals.  Therefore,  this  represents  a  very  difficult  test 
case  for  the  solution  of  the  inverse  problem,  especially  when  it  is  taken  into  account  that  thermal 
conductivity  values  used  to  generate  the  simulated  measurements  are  quite  different  in  the 
three  directions. 

Figures  26. a, b  present  the  exact  and  estimated  temperatures,  as  well  as  the  simulated 
measurements,  at  the  center  of  the  heated  surface  at  x  =  a,  for  standard  deviations  of  the  errors 
of  cr  =  0.01  and  a  =  0.05,  respectively.  The  estimated  temperatures  were  obtained  with  the 
solution  of  the  direct  problem  with  the  mean  values  estimated  for  the  parameters.  It  should  be 
noticed  in  figures  26. a, b  that  the  estimated  temperatures  are  in  excellent  agreement  with  the 
exact  ones,  even  for  very  large  measurement  errors  as  for  cr=  0.05  (see  figure  26.b). 


Table  2.  Results  obtained  with  the  MCMC  method 


Standard  Deviation  for 
the  measurements 

Parameter 

Mean 

Standard 

Deviation 

(T—  0.01 

ki 

0.983 

0.004 

k2 

14.74 

0.06 

k3 

15.04 

0.05 

<7=  0.05 

*i 

0.99 

0.02 

k2 

14.8 

0.2 

kz 

14.7 

0.3 
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The  mean  values  for  the  parameters,  as  well  as  their  correspondent  standard  deviations 
were  obtained  from  the  posterior  distribution  generated  with  the  Metropolis-Hastings  algorithm 
(see  table  2).  This  table  shows  that  the  MCMC  Bayesian  approach  with  the  Metropolis-Hastings 
algorithm  provided  very  accurate  estimates  for  the  unknown  thermal  conductivity  components, 
even  for  very  large  magnitudes  of  the  measurement  errors,  such  as  a  -  0.05.  As  expected,  the 
standard  deviations  of  the  estimated  parameters  increase  with  the  magnitude  of  the 
measurement  errors. 

The  states  of  the  Markov  chain  resulting  from  the  application  of  the  Metropolis-Hastings 
algorithm,  for  the  cases  with  standard  deviation  of  the  measurement  error  of  o-  0.05  show  that 
the  posterior  distribution  for  the  parameters  after  the  Markov  chain  reaches  equilibrium 
resembles  that  for  a  linear  estimation  problem,  i.e.,  an  ellipsoid.  Such  is  the  fact  despite  the 
non-linearity  of  the  present  estimation  problem  and  is  due  to  the  use  of  optimized  experimental 
variables  in  the  estimation  procedure  [41].  A  burn-in  period  is  required  for  the  MCMC  to  reach 
equilibrium,  where  the  samples  come  from  their  starting  states  to  gradually  form  the  posterior 
distribution  for  the  parameters.  As  expected,  the  posterior  distribution  is  more  spread  for  larger 
measurement  errors  which  is  apparent  from  the  analysis  of  figures  27,  which  present  the  states 
of  the  Markov  chain  for  each  parameter  (marginal  distributions)  for  a=  0.05.  The  burn-in  period 
for  <j=  0.01  was  of  the  order  of  4000  states,  while  for  o-  0.05  the  burn-in  period  was  of  the 
order  of  2000  states.  The  mean  values  and  the  correspondent  standard  deviations  shown  in 
table  2  were  computed  by  discarding  the  samples  during  the  burn-in  period. 


Although  an  order  of  magnitude  computationally  more  expensive  than  Kalman  filters,  it  is 
highly  advisable  to  use  MCMC  with  the  Metropolis-Hastings  algorithm  especially  when  dealing 
with  non-linear  problems,  since  this  approach  offers  higher  accuracy  and  robustness. 
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